Grammar based statistical MT on HadoopAn end-to-end toolkit for large scale PSCFG based MT
نویسندگان
چکیده
This paper describes the open-source Syntax Augmented Machine Translation (SAMT) on Hadoop toolkit—an end-to-end grammar based machine statistical machine translation framework running on the Hadoop implementation of the MapReduce programming model. We present the underlying methodology of the SAMT approach with detailed instructions that describe how to use the toolkit to build grammar based systems for large scale translation tasks.
منابع مشابه
A Systematic Comparison of Phrase-Based, Hierarchical and Syntax-Augmented Statistical MT
Probabilistic synchronous context-free grammar (PSCFG) translation models define weighted transduction rules that represent translation and reordering operations via nonterminal symbols. In this work, we investigate the source of the improvements in translation quality reported when using two PSCFG translation models (hierarchical and syntax-augmented), when extending a state-of-the-art phraseb...
متن کاملNew Parameterizations and Features for PSCFG-Based Machine Translation
We propose several improvements to the hierarchical phrase-based MT model of Chiang (2005) and its syntax-based extension by Zollmann and Venugopal (2006). We add a source-span variance model that, for each rule utilized in a probabilistic synchronous context-free grammar (PSCFG) derivation, gives a confidence estimate in the rule based on the number of source words spanned by the rule and its ...
متن کاملGappy Pattern Matching on GPUs for On-Demand Extraction of Hierarchical Translation Grammars
Grammars for machine translation can be materialized on demand by finding source phrases in an indexed parallel corpus and extracting their translations. This approach is limited in practical applications by the computational expense of online lookup and extraction. For phrase-based models, recent work has shown that on-demand grammar extraction can be greatly accelerated by parallelization on ...
متن کاملAn Efficient Two-Pass Approach to Synchronous-CFG Driven Statistical MT
We present an efficient, novel two-pass approach to mitigate the computational impact resulting from online intersection of an n-gram language model (LM) and a probabilistic synchronous context-free grammar (PSCFG) for statistical machine translation. In first pass CYK-style decoding, we consider first-best chart item approximations, generating a hypergraph of sentence spanning target language ...
متن کاملMachine Translation Strategies: A Comparison of F-Structure Transfer and Semantically Based Interlingua
Two machine translation (MT) systems which respectively utilize the transfer and interlingua strategies will be presented and compared, emphasizing design principles. Feature structures and unification-based grammar are common denominators for the two MT systems; in particular, both make use of Lexical-Functional Grammar (LFG). In the transfer system. Machine Translation Toolkit, developed by E...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Prague Bull. Math. Linguistics
دوره 91 شماره
صفحات -
تاریخ انتشار 2009